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I. INTRODUCTION 

A. Object-oriented programming in scientific 
computing 

A plethora of programming languages are available 
nowadays. While all are of academic interest, at least 
in principle, only a few are used in large scale compu- 
tations, and these can be roughly categorized into two 
groups: those promoting imperative programming (such 
as Fortran and C) and those promoting object-oriented 
programming (OOP). For the usual implementations of 
imperative programming, each statement of the program- 
ming language can be translated into simple machine 
code constructs; while OOP is more abstract by con- 
sidering data as opaque objects, each having a unique 
set of methods to manipulate data. Only a pre-defined 
set of operations are allowed on data, and the internal 
representation of data is not visible to the outside; this 
indirectness has been proven to be very helpful to the 
designing, the writing, and the modifying of the pro- 
gram. OOP was not widely used until the emergence 
of programming languages that support expressing OOP 
concepts natively, most notably C++. 1,2 C++ provides 
both the programming convenience of OOP (and other 
higher-level paradigms) and, most importantly, the run- 
time efficiency of C. C++ makes OOP not only of aca- 
demic interest but also of practical use since the runtime 
efficiency is not compromised much. 

To better distinguish these two paradigms, the con- 
cept of 'coupling' 3-5 need to be introduced. If changing 
a certain part of the program induces the need to change 
another part, these two parts of the program are coupled. 
Since imperative programming is closer to machine code, 
one often manipulates raw data directly. This is inher- 
ently fast, but this also makes the program dependent on 
how the raw data is represented, creating tighter coupling 
between different parts of the program. If the represen- 
tation of the data is to be changed later, a large amount 
of the code need to be changed. On the other hand, OOP 
promotes the separation of interface (what manipulation 
is to be done on the data) from implementation (how the 



data is actually manipulated). As a consequence, good 
OOP design achieves loose coupling and greater ease of 
maintenance. 

The use of OOP in quantum mechanics was pio- 
neered in the work of DFT++ by Ismail-Beigi, 6 and 
since then many other applications have emerged, such 
as S/PHI/nX, 7 JDFTx, 8 and others. Despite the vast 
advantage provided by OOP, a considerable number of 
programs used in computational physics are still writ- 
ten with imperative programming in mind. While the 
runtime efficiency is good, the codes are harder to learn, 
to modify, and to extend. We have recently calculated 
exciton binding energies with time-dependent density- 
functional theory (TDDFT), 9 and we have developed a 
C++ code 10 for this calculation employing OOP. We pro- 
vide a case study of our code to demonstrate how OOP 
is helpful in computational physics, and we thus hope to 
promote its usage. 

B. Background of the physical problem 

We briefly introduce the physical problem here to pro- 
vide a context for our code, while the computational de- 
tails of the problem are described later. Excitons are 
coupled electron-hole pairs in solids, 11,12 and they show 
up in optical absorption spectra as discrete absorption 
peaks below the band gap. Many-body theories such 
as the Bethe-Salpeter equation (BSE) 13,14 describe exci- 
tons well, but the computation is costly, and thus its use 
is limited. An alternative to the BSE is TDDFT, 12 ' 15 
which is a promising excited-state electronic structure 
method widely used for finite systems such as molecules, 
but is gaining popularity for periodic systems. Density 
functional methods balance accuracy and computational 
cost by calculating an auxiliary non-interacting Kohn- 
Sham system that has the same electronic density as the 
real, interacting system. Though TDDFT is formally ex- 
act, one needs to approximate the exchange-correlation 
(xc) many-body effects in practice; numerous approxi- 
mations for the xc potential v xc and for the xc kernel 
/xc = Sv xc /Sn (n is the one-particle density) have been 
successfully applied to describe electronic structure and 
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excitations in materials. 

The excitons have been difficult to obtain from a 
TDDFT calculation. While the main difficulty is an 
unusually stringent requirement on the xc kernel, 16 the 
commonly used calculation approach is also not suitable 
for exciton binding energies. 1 We have developed an al- 
ternative approach for calculating exciton binding ener- 
gies in TDDFT in our recent work. 9 Our code constructs 
and solves an eigenvalue problem as follows (atomic units 
e = h = m e = 1 /47reo = 1 are used throughout the paper 
unless otherwise mentioned): 
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where e are the Kohn-Sham orbital energies of the under- 
lying electronic ground-state calculated beforehand from 
the ABINIT code, 18 i, m denote valence bands, j, n de- 
note conduction bands, the eigenvalue u is the excitation 
frequency, and the corresponding eigenvector p describes 
how single-particle excitations combine into the real ex- 
citation. The Hartree-exchange-correlation (Hxc) matrix 
F HXC = 2Fh + F xc is composed of the Hartree part -Fh 
and the xc part F xc , defined as 
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Here, V is the volume of the crystal, G and G' are the 
reciprocal lattice vectors, k is the wavevector in the first 
Brillouin zone, and k together with band index i, j, m, 
or n specifies the Kohn-Sham orbitals. The approxima- 
tion for the xc kernel f xc is chosen by the user for each 
calculation. 

The formalism above assumes spin-unpolarized sys- 
tems and does not explicitly treat spin. It cannot 
describe spin-flip triplet excitations. Proper treat- 
ment of spin-flip excitations requires the non-collinear 
spin formalism. 12 For spin-unpolarized systems, however, 
there is a shortcut for calculating triplet excitations. We 
define 
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where f xc and fxc are parts of the spin-dependent xc 
kernel, 12 and they need to be approximated as well in 
practice. Solving Eq. (1) with /xc 18 ' ' and / x " plct yields 
singlet and triplet excitations, respectively. 

The G = G' = part (so-called 'head') of f xc requires 
special treatment, since it diverges as q~ 2 as q — > 0. For 
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q + 1 0, the matrix element (jk |e lG r 
(3) becomes (jk |e i(q+G) ' r | ik - q), 
q — > the divergence in / xc is canceled out. 
contribution to Eq. (3) is then calculated as 
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where p is the momentum operator, f is the position op- 
erator, and Vni is the non-local part of the pseudopoten- 
tial. In our study, 9 we are only interested in the exciton 
binding energy instead of the entire optical spectrum, so 
we only need to include a few bands in Eq. (1), which 
simplifies the problem. 

This paper is structured as follows: in Sect. II we ex- 
amine the needs of the exciton project, and we discuss 
the design of our code based on these needs; we then 
show the implementation details in Sect. Ill to demon- 
strate the role of OOP in the development of our code; 
we conclude in Sect. IV. 

Several pseudo-code examples are given in the text; the 
reader is asked to refer to the supplemental material for 
the actual codes. The calculation results were presented 
in Ref. 9 and are not repeated here. 



II. DESIGN 
A. More general remarks on OOP 

Before going into specifics of the design of our code, 
let us discuss a few important features OOP. We men- 
tioned in Sect. I that OOP views data as opaque objects 
that are accessible only through a predefined set of meth- 
ods, through which the visible interface (methods) and 
the invisible implementation (representation of data and 
their manipulation) are separated, allowing one to pro- 
gram without worrying about details. An object has a 
certain predefined type (known as 'class' in C++) that 
defines interface and encapsulates implementation, and 
one class can have more than one object instances. 

Another defining feature of OOP is inheritance and 
polymorphism. Inheritance derives an object type from 
a base type by copying the code of the base type into its 
own definition. This achieves code reuse and uniformity, 
such that a common operation can be defined within a 
base type, and if later it is to be modified in the base 
type, the change automatically applies to all the derived 
types; but more importantly, inheritance models a logi- 
cal connection between types, most commonly an 'is a' 
relation 4 [for example, a long-range correction (LRC) xc 
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bands:BandDataBZ 



I 

psp:Pseudopotential 



chiQ:Prop ChiO qQ sym 
(Kohn Sham linear response function) 



expiqr:Prop ExplGr 
(<zk|e iG ' r |;k>) 



momentum:Prop MomentumMatElem 
(<ik|p|;k>) 



eiqensolvenEiqenSolver 





commutator:Prop Commutator r Vnl 
«ik||r,Vni]|/k» 



V w u 



fmatiHamiltonian 



xc :fxc 



orbden:Prop OrbDen 
(orbital density) 



FIG. 1. UML 19 collaboration dia gram of our code, where every node is an object. Only the most important objects are shown. 
The arrows denote dependence between objects, with the arrowhead side depending on the arrowtail side. Only those objects 
which are directly related to the calculation are shown. 



kernel (derived type) is an xc kernel (base type)]. This 
leads to the so-called polymorphism — an object of the 
derived type can be used anywhere an object of the base 
type is required, since the object of the derived type 'is 
an' object of the base type. 

Polymorphism further decouples interface and imple- 
mentation, so that the base type provides a uniform in- 
terface of methods, and the derived types do the actual 
calculation. For example, calculation of the p'.W k H mnk ) 
matrix elements in Eq. (3) requires the value of the re- 
ciprocal space xc kernel f xc (defined as a base type), 
but this calculation does not need to know either the 
actual choice of / xc or what types of / xc are available 
(which all derive from the common / xc base type and do 
the calculation); since this calculation does not rely on 
details of / xc themselves, one can easily write a new de- 
rived type of / xc and expect it to work like other / X c's, 
without needing to change the code for calculating the 
p(gjk.)(mnk ) ma j. r j x e l em ents. In C++, polymorphism is 
provided through virtual methods. 1,2 

It should be noted that the main purpose of program- 
ming paradigms such as OOP is to provide a framework 
with which programmers can structure, modify, or extend 
their program with ease, and thus increase the produc- 
tivity and the development speed. The runtime efficiency 
is not the primary concern of programming paradigms. 
Object-oriented programs generally produce more ma- 
chine codes for the same task; object-oriented codes are 
harder to optimize by the compiler due to no direct map- 
ping to machine codes; the memory access pattern is 
more difficult to predict — all these issues make object- 



oriented codes run slightly slower than programs written 
with imperative programming languages. The OOP com- 
pensates the runtime efficiency lost with a huge gain in 
development and maintenance efficiency. 

OOP alone does not guarantee good programming 
quality, and one needs to design the object structure with 
care. We identify the following characteristics of our ex- 
citon project 9 that guide the design of our code: this 
project is on a specific problem instead of general quan- 
tum physics; this project has been and will be worked 
on by different people; we do not want to reinvent the 
wheel, and thus we need to use functionalities provided 
by other codes as much as possible; we want to focus on 
the physical problem instead of numerical details, so we 
use specialized libraries; the focus of the project is on 
developing the theory instead of merely on calculating 
numbers. 

With these characteristics in mind, the general require- 
ments on the code are as follows: lightweight, fast to de- 
velop, easy to access and to extend, adaptable to different 
libraries, efficient to run moderate-sized calculations, and 
sufficient runtime flexibility to allow for rapid testing of 
ideas. We show the overall object layout in Fig. 1. 

OOP helps the structuring the program by abstrac- 
tion — the type of data is not determined by the repre- 
sentation, but by the allowed methods manipulating it. 
How the data is represented and how the methods change 
the data are implementation details that should not have 
any influence on the user of the data; thus, the user can 
avoid being sidetracked by details. But a common pitfall 
in the practice of OOP is to overly generalize the object 
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Band Data 

Ground-state data within the 
irreducible Brill ou in zone 



+Ini tia HzeFromFi le ( const s tring&) 

+Outputlnfo(ostream&) const 
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Band Data BZ 

Ground-state data within the full 
Brillouin zone 



BandData ABINIT 

Specifies input from ABINIT file format 



Glist 

List of reciprocal lattice vectors used 
in calculation 



+GetAllowedG(BZk) const 
+GetAUAllowedG() const 
+GetAllowedGPair( )(BZk) const 
+GetAHAllowedGPair( ) const 
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Glist maxGlen 

generate Glist with supplied max length 




Glistecut 

generate Glist with supplied max 
kinetic energy cutoff 





Hamiltonian 

calculates the HXC matrix 



delegate I/O to» 



HamillnOut Adapter 



+read(istream&) 
+print(ostream&) const 



VmatlnOutAdapter 

for Hartree matrix 



XCmatlnOutAdapter 

for xc matrix 



EigVallnOutAdapter 

for eigenvalues 



EigVectlnOutAdapter 

for eigenvectors 



H matin OutAdapter 

for the entire matrix in Eq. (1) 



EigenSolver 



+Solve( needEvec ) const 



LAPACK simple 



LAPACK DAC 



l_ LAPACKexpert 



Properties 



+read( istream&) 
+print{ostream&) const 
+name(i const 



IBZPropertie s 



#bands: const BandData& 



1_ 



BZProperties 



#bands: const BandDataBZ& 



PropExplGr 

calculates <ik\exp(iGr)\jk> 



Prop MomentumMatElem 

calculates <ik\p\jk> 



PropChiOqOsym 

calculates Kohn-Sham response function 
for q=0 



PropOrbDen 

calculates orbital densities 



PropCommutatorrVnl 

calculates <ik\[r,Vn!]\jk> 



Propem 

calculates macroscopic dielectric 
function 



fxc 

xc kernel in reciprocal space 



+fxcQspace(q,G,Gl) const 
+fxcQspace_wing_G_mul_q(q l G) const 
+fxcQspace_wing_Gl_mul_q{q, Gl) const 
+fxcQspace_head_mul_q2(q) const 
+read(istream&) 
+print(ostream&) const 
one ( ) cons t 
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fxc_spin 

spin-dependent xc kernel 



#mode = singlet/triplet 



I 



fxcspinmixed 

use two fxc objects as singlet and 
triplet, respectively 



I 



fxchybridspinbootstrap 

hybrid kernel combining the bootstrap 
kernel and the ALDA kernel 



fxc nothing 

random phase approximation 



fxccontact 

contact interaction kernel 



fxcjrc 

long range correction kerne! 



fxcHartree 

Hartree kernel 



fxc_ALDA_PW92 

adiabatic local density approximation 



fxc_PGG 

linearized optima! effective 
potential/exact exchange kernel 



fxcbootstrap 

bootstrap kernel 



FIG. 2. UML 19 class diagram of the important classes in the exciton project, 
important member variables are shown. 



Only the virtual methods of base classes and 



concept and complicate simple, unambiguous operations. 
Therefore it is important not to lose sight of the goal of 
the code in order to determine the level of abstraction 
and granularity of objects. 

For example, the Hxc matrix F HXC in Eq. (1) con- 
tains a large portion of data which are logically insepa- 
rable. The construction of this matrix involves compli- 
cated calculations and must be done efficiently, and the 
role of the matrix is pre-determined by Eq. (1). Packag- 
ing the involved calculations inside an fmat object is a 
good abstraction, since this decouples it from other im- 
portant operations the code needs to perform — such as 



reading the input files, preparing ground-state orbitals 
and so on, so that these operations can be changed with 
confidence that such manipulations do not affect the cal- 
culation inside fmat. On the other hand, to treat each 
matrix element of -Fhxc as an object would be unneces- 
sarily complicated for the use in our exciton study 9 . 



B. Specifics of the project 

In this project, we quickly found that there was a need 
to develop our own code. Although this is mainly be- 
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cause of the calculation being a new approach for solids 
not available in existing TDDFT codes, another impor- 
tant reason is that the rigid, tightly-coupled structure 
of these codes deters modification and extension. It was 
surprising to see how many changes are needed to the ex- 
isting Fortran-based codes to implement the calculation 
described in Sect. IB. We have been using the ABINIT 
code to calculate the ground-state data, so we use it 
as an example in the following. However, the problem 
described here is general with imperative programming, 
which is prevalent in existing electronic structure codes. 

For example, ABINIT stores the ground-state orbitals 
in multi-dimensional arrays, and this poses three ma- 
jor problems. First, the source code itself (aside from 
comments) does not tell the meaning of each dimen- 
sion, because as an imperative programming language 
Fortran is close to machine code, and therefore lacks 
self-documenting ability. Second, subroutines like nor- 
malization, symmetry operation, calculation of density 
and so on are logically related to the manipulation of or- 
bitals, but the source code does not have the means for 
representing this relation. This information can only be 
discovered by reading other parts of the code or the doc- 
umentation, during which more questions may emerge. 
Also, the compiler is not able to check this logical con- 
nection to ensure these operations are applied to valid 
data. Third, all of the raw data are accessible at once, 
and this risks introducing errors in the code accidentally. 
For example, suppose a subroutine needs to change a 
certain part of the data, but a programming bug affects 
other parts of the data as well; the compiler cannot detect 
this bug since nothing in the program explicitly forbids 
it. Such a bug can remain hidden for a long time. 

As a consequence, one needs to have knowledge of a 
great portion of the code to be able to maintain even a 
small part of it. Though not necessarily a problem for the 
original developers, this tightly-coupled structure makes 
the code hard to learn and to modify for users with par- 
ticular needs. In comparison, with good OOP design, 
the data is accessed through specialized methods with 
explicit meaning, the related operations on the data be- 
come inseparable parts of the object interface, and access 
privileges to the data are differentiated for different parts 
of the program. These features not only make the struc- 
ture of the program clearer, but also make it possible to 
perform more error checking at compile-time rather than 
at runtime, since complex relationships between data can 
be represented in a more explicit manner. 

We take advantage of polymorphism to provide exten- 
sibility, and we demonstrate the extensibility here by ex- 
amples. The inheritance hierarchy of our code is shown 
in Fig. 2. We reuse ground-state data from other codes, 
so we package these data in the BandData class. To avoid 
being locked into a certain ground-state code, the base 
BandData class defines a virtual InitializeFromFile 
method, and it relies on derived classes to do the actual 
data input. We have defined a BandData_ABINIT to read 
the output of the ABINIT code, which is a pseudopoten- 



tial code; changing to other band structure codes such as 
the full-potential ELK code 20 are possible in the future. 
This will only require deriving a new class from BandData 
implementing InitializeFromFile. 

Programming paradigms like OOP facilitate the devel- 
opment of the source code, and the efficiency of the com- 
piled code is not the primary concern of the paradigms. 
The indirectness provided by OOP makes the structure 
of the program clearer, but it also incurs runtime cost: 
not only extra machine instructions, but also a poten- 
tial decrease in cache-efficiency. To avoid such cost, we 
use special libraries such as ScaLAPACK 21 and FFTW 22 
for numerical calculations. The interfaces to these li- 
braries are more suitable for imperative programming, 
and we package the actual calls to these libraries inside 
adapter objects so that we can still benefit from OOP. 
This also provides the possibility to change to other li- 
braries if needed — the adapter defines a uniform inter- 
face, and we only need to provide an implementation of 
this interface using the desired library, without worrying 
about the other parts of the code. 

The main purpose of our code is to help our ongoing 
theoretical studies of TDDFT. Unlike general-purpose 
codes that perform established calculations, the type of 
calculations that our code performs need to change from 
time to time. We provide the runtime flexibility in several 
ways. Instead of hard-coding specific pieces of informa- 
tion, we make our code read as much input as possible at 
runtime to avoid re-compiling and to achieve data reuse. 
Making many functions depend on user input also calls 
for structured exception handling to detect human error 
and to allow for possible recovery from errors. 

For example, we have calculated the exciton binding 
energies with different / xc in Eq. (3) for the same mate- 
rial. Some data like the matrix element (jk |e lG r | ik) do 
not change between such calculations, and having to re- 
calculate it every time would be wasteful. In most cases 
we only need the exciton binding energy as the output, 
but occasionally we also need to calculate the entire spec- 
trum from the eigenvectors of Eq. (1). The spectrum 
does not need to be calculated every time, and it is use- 
ful to allow its separate calculation after a calculation of 
only the binding energy, without having to diagonalize 
the -F H xc matrix in Eq. (1) again. 

We solve these problems by exporting the in- 
put/output (I/O) option of each object to the user and 
by employing the CH — h exception handling mechanism. 
The output that the user requires determines what cal- 
culations are carried out, so that no extra calculation 
would be necessary; when a specific object is required 
during the calculation, the program searches for the user- 
specified input data file for this object and attempts to 
reuse the data; if the data file is corrupt or non-existent, 
an C++ exception is thrown, and the exception handler 
in the main program allows for recovery by calculating 
this part of data from scratch. The detailed implemen- 
tation will be discussed below in Sect. III. 
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III. IMPLEMENTATION 

The calculation performed by our program is shown in 
Fig. 1. The program starts by reading the the ground- 
state data in the irreducible Brillouin zone and pseudopo- 
tcntials (psp); the ground-state data is then extended 
to the full Brillouin zone by symmetry operations and 
stored in bands. Properties required for later calcula- 
tion are then calculated beforehand in parallel, expigr, 
momentum, commutator are always calculated since they 
are required in the calculation of Eq. (3) and Eq. (5); 
other properties are only calculated when the specified 
f xc needs them. The calculation of the xc kernel f xc is 
then carried out in xc with required properties and the 
ground state data. Then, fmat calculates the matrices 
in Eq. (2) (V matrix) and Eq. (3) (XC matrix), com- 
bines them into the left-hand side matrix in Eq. (1) (H 
matrix), and uses the provided eigensolver to solve Eq. 
(1). In the following, we describe certain details of the 
implementation of our code in order to demonstrate the 
benefits of OOP. 



A. BandData 

The BandData class holds all the ground-state data 
that is obtained from other codes. The class interface 
does not expose any raw data, but it contains methods for 
getting the values of the required data, such as the lattice 
vectors (in real or reciprocal space), the volume of the 
unit cell, the range of bands included in calculation, the 
orbital energies, the Bloch functions and so on. This not 
only disallows outside code to change the data, but also 
allows these methods to perform additional operations 
aside from getting the data. 

For example, after the first executable version of our 
code was finished, we implemented an additional scissor 
correction within the method for getting band energies 
to shift the conduction band energies by a certain value. 
Other parts of the code that use band energies continue 
to use this method as in the previous version and require 
no changes. By contrast, one can create a new subroutine 
for this correction in imperative programming, but one is 
then forced to change the other parts of the code to en- 
sure compatibility, which is both tedious and error-prone. 
One can also create a subroutine for getting the band 
energies from the beginning, and implement this correc- 
tion within this subroutine; but this not only requires 
planning long before its actual use, but also requires the 
programmer to be aware of this subroutine and actively 
use it instead of using the raw data. Since imperative 
programming cannot represent the relation between this 
subroutine and the data, the burden of knowing the de- 
tailed structure of the program is on the programmer. 

To decouple the structure of our code from any spe- 
cific ground-state code, the BandData class (Fig. 2) does 
not implement the reading of actual data from provided 
files, but it defines the abstract InitializeFromFile 



method and relies on derived classes to implement 
the method. In our recent study, 9 we have used the 
ABINIT code 18 for the ground-state calculation, and 
we derive the class BandData_ABINIT, which implements 
InitializeFromFile to read the data files produced by 
ABINIT. We plan to use the ELK code 20 in the future. 
Although ELK uses a different basis set (linearized aug- 
mented plane waves instead of ordinary plane waves), we 
can hide these details in the new derived class which con- 
verts the basis set, so that other parts of the code do not 
need to know how the raw data is represented. 

The ground-state band-structure codes usually use 
symmetry of the system to simplify the calculation; in our 
case the ground-state data only contains the k-points in 
the symmetry-reduced (irreducible) Brillouin zone. The 
TDDFT calculation Eq. (1) needs the entire Brillouin 
zone, however. Also, different parts of the program may 
need to work with different numbers of bands or fc-points: 
the fmat object may use fewer bands than the xc object 
for faster calculation. We derive the BandDataBZ class 
from BandData for two purposes: to generate symmetry- 
related data and to make a selected part of the band 
viewable. 

For the first purpose, BandDataBZ is mostly an exten- 
sion to the BandData interface, providing more methods 
to access the symmetry-related data, and inheritance al- 
lows the code of general methods (such as getting the 
lattice vector) to be reused. For the second purpose, 
the methods for getting the band range and number of 
k-points defined in BandData now returns values suit- 
able for the selection, so that user classes of BandDataBZ 
cannot distinguish whether the entire band or only a 
part is used, allowing them to be treated uniformly. As 
shown before, this loose-coupling between components 
helps modification and extension of the program. 

B. Properties 

While in principle all calculations can be done only 
with the data in the BandData class, some specific cal- 
culations occur frequently, so we encapsulate them in 
classes derived from Properties to allow data reuse and 
save time. The Properties interface only contains ab- 
stract I/O functions (read and print) and a name func- 
tion (Fig. 2), with the intent that the required properties 
are calculated together and the I/O of them are done to- 
gether in the main program. C++ uses streams to do 
I/O, and one can provide custom extraction (>>) and in- 
sertion («) operators to use user-defined classes together 
with streams. With the help of the abstract I/O func- 
tions of Properties, the extraction and insertion oper- 
ators can be implemented by the following pseudo-code: 

istreamfe operator» (istreamfe is, Propertiesfe prop) 
{ 

string tmp_name ; 
is >> tmp_name; 
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if (tmp_name != prop. name ()) // wrong file 
throw runtime_error (/*error message*/); 

return prop . read(is) ; 

} 

ostreamfe operator<<(ostream& os, 
const Propertiesfe prop) 

{ 

os << prop. name (); 
return prop. print (os) ; 

} 

We use C++ standard exception handling here, so in the 
main program doing I/O we can recover from reading a 
corrupted file: 

try 
{ 

file >> prop; 

} 

catch(const runtime_error& err) 
{ 

if (err. what ()==. . ./*I/0 error message*/) 
. . . /*construct prop from beginning 
instead of from file*/ 
else throw; 

} 

This has the advantage over returning an error code, be- 
cause the exception cannot be ignored, and the exception 
handling can happen far away from the actual point of 
error. In the previous example, the error can happen 
both in operator» and in the read method of a de- 
rived class of Properties. For a program using error 
codes, they need to be propagated and checked manu- 
ally at every step, while C++ exceptions propagate auto- 
matically and will terminate the program if not handled. 
The resources are also automatically freed with the help 
of smart pointers. 23 This not only makes the program 
more structured (by grouping error handling together), 
but also automatically prevents the program from being 
in an inconsistent state when an error occurs. 

Properties is meant to hold frequently used interme- 
diate results, such as (jk |e lG ' r | ik) in Eq. (3), calcu- 
lated by the Prop_ExpIGr class. Therefore it is crucial 
that these results can be accessed efficiently. We heavily 
use the C++'s standard template library (STL) 24 as a 
way of organizing data, so that the time and space cost 
for accessing the data is well-defined and easy to change 
for different uses. 

For example, Prop_ExpIGr values are labeled by i, j, 
k, and G. We define a custom expigr_key type to hold 
these values, and use it as the key of the standard map 
container. The map container guarantees fetching and 
inserting value in logarithmic time. Its interface over- 
loads the array access operator (operator [] ), so the 
use resembles the regular arrays. This makes access- 
ing the values both efficient and intuitive. If later we 



need faster access of the results, we can change to use 
the standard unorderedjmap container which guarantees 
constant time access. Only one line of the code needs 
to be changed, and the implementation of Prop_ExpIGr 
does not need to change thanks to the similar interface 
between all standard containers. 

STL also benefits from OOP, in the sense that only the 
interface is visible to the user. Different implementations 
of STL are available: some optimize for speed, some for 
memory usage, and some for thread safety. These imple- 
mentations are completely decoupled from the codes us- 
ing them, so we can change to a different implementation 
without changing anything in our source code. Separat- 
ing interface and implementation is of course not a new 
concept, but OOP makes it easier and safer to use. 

We calculate properties beforehand to exploit paral- 
lelism. We implement two level of parallelism in our 
code, the first being process (an instance of the program 
with its own address space) level implemented with the 
message-passing interface 25 (MPI), and the second being 
thread level implemented with OpenMP 26 compiler di- 
rectives. We presume that communication between pro- 
cesses (which are running on different nodes of a cluster) 
is slow, and thus we minimize the number of communi- 
cations and group communications together. 

The derived classes of Properties involve doing the 
same calculation on different sets of data. We implement 
parallelism accordingly. We generate all the jobs that 
need to be done and pass roughly the same number of 
jobs to each process with MPI, and then use OpenMP 
to parallelize the calculation of the jobs on each process. 
When all calculations are done, we collect the results and 
distribute them to all processes using MPI. OpenMP is a 
standard for specific compiler directives, designed to be 
able to be added to a serial program without changing its 
structure. For non-conforming compilers, these compiler 
directives are ignored, and the program reverts to a serial 
program automatically. 

Unlike the OpenMP parallelism which can be easily 
added, the MPI standard is designed with imperative 
programming in mind. The MPI subroutines (even with 
its C++ binding) operate on raw data. The user is re- 
quired to consider low-level details such as the buffer or 
the memory alignment, and this does not work well to- 
gether with OOP. The Boost:: MPI 2 ' library encapsu- 
lates the low-level MPI subroutines in an object-oriented 
manner. With very few extra programming, Boost : :MPI 
allows objects of user-defined types to be used in the 
same syntax as data of fundamental MPI types, and the 
low-level details involved in using user-defined types are 
determined automatically. The calculation of a derived 
class of Properties is done by the following pseudo-code: 

mpi :: communicator world; 
vector<key> myjob; 

if(myid == ID_Master) 
{ 

vector<key_type> jobs; 



// generate all the jobs on one node 
for( . ../*all possible jobs*/ ) 
jobs .push_back( . ../*one job*/ ) 

vector< vector<key_type> > splitted_jobs ; 

/♦prepare the jobs to be distributed to each 
nodes, store in ' splitted_jobs ' */ 

/* distribute the jobs */ 

mpi : : scatter (world, splitted_jobs ,myjob, ID_Master) 
} // jobs and splitted_jobs are destroyed here 
/* receive the jobs for this node */ 
else mpi : : scatter (world, my job , ID_Master) ; 

size_t pos; 

// DpenMP parallelization for jobs on this node 
#pragma omp parallel for 
for(pos = 0; pos < myjob. sizeO ; ++pos) 

// store result in a container local_result 

DoCalculation (myjob [pos]) ; 

result_type temp; 

// gather results and distribute to all nodes 
mpi :: all_reduce (world, local_result , temp, 
.../*an function object combining results*/); 

local_result . swap (temp) ; 
// now local_result contains all results 

The C++ STL containers as well as user-defined types 
are used directly with Boost: :MPI without exposing the 
underlying raw data. STL does not specify thread-safe 
write access, but thread-safety can be easily achieved 
with OpcnMP synchronizing constructs such as the crit- 
ical region. The pseudo-code for DoCalculation ap- 
peared above is 

void DoCalculation(const key_type& key) 
{ 

/* do the calculation indicated by key, 
store in variable 'result' */ 



// OpenMP critical region 

#pragma omp critical 

{ 

local_result . insert (result) ; 

} 

> 

This guarantees that only one thread writes to the con- 
tainer at a time. Though critical regions are costly, in 
practice we find that the actual calculation takes much 
more time, compared to which the cost of the critical 
region is acceptable. 



C. fxc 

The xc kernel / xc is central to a TDDFT calculation. 
Many approximations have been developed since the ex- 
act /xc is unknown. In our recent study, 9 different / xc 's 
are treated uniformly as in Eq. (3), but we use several 
/xc's which are different in their own details: some only 
need trivial calculation, but some require extra data and 
significant calculation. These makes / xc ideal to benefit 
from polymorphism. 

The abstract fxc base class defines an interface (Fig. 
2) for accessing the value of / xc in reciprocal space (via 
fxcQspace and related methods) and for I/O. The I/O 
of fxc is done similarly as in Properties. It should be 
noted that the polymorphism of fxc can also be achieved 
by function pointers in imperative programming. In- 
stead of an object of fxc type, the calculation of Eq. 
(3) can require several function pointers playing the role 
of fxc: : fxcQspace and related methods. 

The function pointer approach works well for simple 
xc kernels that can be calculated on-the-fiy. However, 
xc kernels requiring heavy calculation (such as the 'boot- 
strap' kernel 28 ) must be calculated beforehand and the 
results must be stored; with function pointers this is only 
achievable by having stronger coupling between different 
parts of the program — one has to find a place to store 
these intermediate data, and thus making it harder to 
maintain. With the object approach, however, interme- 
diate results are conveniently stored in private member 
variables and are concealed from other parts of the pro- 
gram. As shown previously, the major difference between 
OOP and imperative programming implementations is 
whether the relation between data (pre-calculated xc ker- 
nel) and subroutines manipulating it (fxcQspace etc.) 
can be explicitly expressed through native constructs of 
the programming language. This expressiveness of OOP 
compared to imperative programming allows more error- 
checking to be delegated to the compiler, reducing pos- 
sible human error. 

The polymorphism of fxc also helps when doing triplet 
calculations. As shown in Sect. IB, singlet/triplet exci- 
tations for spin-unpolarized system can be calculated by 
using different xc kernels as in Eq. (4). We derive the 
abstract f xc_spin class from fxc, which contains an ad- 
ditional singlet /triplet mode switch. Since fxc_spin is 
an fxc, it is used in the same way for evaluating Eq. 
(3). One obtains both singlet and triplet results by do- 
ing two successive calculations with the corresponding 
mode of f xc_spin. The code calculating Eq. (3) neither 
knows that the calculation is spin-dependent, nor can a 
bug change the mode accidentally. 

D. Hamiltonian 

We define the Hamiltonian class to construct the ma- 
trices in Eqs. (2) and (3) and to solve the eigenvalue 
problem Eq. (1). The storage and I/O of data is one 
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Wrapper for BLAC5, the communication 
library used by ScaLAPACK 



FIG. 3. UML 19 class diagram of ComplexMatrix and 
LocalComplexMatrix. Only important methods and member 
variables are shown. 



represent the local part of a distributed matrix. The code 
in ComplexMatrix is thus reused for operations not di- 
rectly related to the distributed nature, such as adding 
two matrices. To solve the eigenvalue problem Eq. (1) 
represented in a LocalComplexMatrix object, we define 
an abstract EigenSolver class, whose derived classes 
package the actual ScaLAPACK subroutine calls. By 
doing so the Hamiltonian class is decoupled from the 
representation details of the distributed matrix. 

The I/O of the distributed matrices is problematic, be- 
cause simply saving the data on each node to disk forbids 
using a different number of nodes when reusing this data. 
We use MPI to deal with the I/O problem. ScaLAPACK 
uses a block-cyclic format for distributed matrices, which 
is supported by the MPI 2.0 standard as a representable 
data type. The MPI parallel I/O interface also contains 
unrelated details for the Hamiltonian class, so we pack- 
age them into the HamillnOutAdapter class. In the end, 
the Hamiltonian class only determines what calculations 
need to be done, but does not depend on how they are 
actually carried out. In this way, the code achieves a high 
degree of flexibility and extensibility. 



of the main problems. The matrices have dimension 
N v x N c x 2Vk, where N v is the number of valence bands, 
N c is the number of conduction bands, and Nk is the 
number of k-points in the full Brillouin zone. For a con- 
verged calculation, the number of matrix elements must 
be at least of the order of 10 8 . Hamiltonian thus contains 
hundreds of gigabytes of data in total. Due to hardware 
limitations, we are forced to use distributed storage and 
parallel algorithms to handle these matrices. 

The ScaLAPACK library 21 nicely suits our needs. 
It provides expertly tuned parallel algorithms for dis- 
tributed matrices, but its interface exposes many imple- 
mentation details that do not fit into the OOP frame- 
work. For example, the local storage format of the dis- 
tributed matrix must be provided when calling most of 
the subroutines of ScaLAPACK. If Hamiltonian had to 
manage such details, it would create lots of duplicate 
code, which would be hard to maintain. Changing to 
other libraries would then involve major rewriting of the 
Hamiltonian class. We package the use of numerical li- 
braries in the ComplexMatrix and LocalComplexMatrix 
classes (Fig. 3). 

We use ComplexMatrix to store a matrix on one pro- 
cess: it allocates the necessary memory through a pro- 
vided object of the MatrixAllocator class, it gives ac- 
cess to the matrix elements, and it provides commonly 
used operations such as adding two matrices. By wrap- 
ping several BLAS 29 and LAPACK 30 subroutines for 
computation-heavy tasks, ComplexMatrix not only pro- 
vides numerical efficiency, but also hides the representa- 
tion of the matrix, making substituting BLAS /LAPACK 
with other libraries possible. 

For the use of matrices in the Hamiltonian class, 
we derive LocalComplexMatrix from ComplexMatrix to 



IV. CONCLUSION 

We have presented and examined our code for the re- 
cent study of exciton binding energies with TDDFT. 9 
We provide our code as a case-study showing the advan- 
tages of OOP and promoting the use of OOP in scientific 
programming. The source code is made available as sup- 
plemental material, 10 but it is not intended to be used 
as a black box; it only serves as an example to illustrate 
the power of OOP and to provide the background of this 
work. 

Compared with imperative programming, OOP helps 
to analyze and model the computational problem at a 
level closer to the actual physical problem. By provid- 
ing an explicit connection between data and operations 
manipulating the data (through programming language 
support), the structure of the program becomes loosely- 
coupled, i.e., changing one part of the code does not affect 
the behavior of other parts. Consequently the mainte- 
nance of the program becomes easier. 

The versatility and expressiveness of OOP allows del- 
egating many types of error-checking to the compiler, 
allowing earlier detection of bugs and reduction of hu- 
man error. Encapsulating irrelevant details inside ob- 
jects frees physicists from being entangled by implemen- 
tation details and instead allows concentrating on the 
physical problem at hand. Like any other programming 
paradigms, OOP itself does not guarantee a good pro- 
gram, but the abstraction provided by it helps achieving 
higher programming quality. 
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